\section{The GPCG Algorithm}

\label{alg}

The GPCG algorithm uses a gradient projection method
to identify a face of the feasible region $ \Omega $
that contains the solution, and the conjugate gradient
method to search the face.
This section provides an outline of the algorithm and notes any
differences between our implementation and the implementation
of Mor\'e and Toraldo \cite{more-toraldo}.

Given $y_0=x_k$,
the gradient projection method generates a sequence of vectors $\{y_j\}$
in the feasible region $\Omega$ such that
\begin{equation} \label{next-y}
y_{j+1} = P [ y_j - \alpha_j \nabla q(y_j) ],  
\end{equation}
where $P$ is the projection
onto (\ref{def_bounds}), and
the step size $\alpha_j$ is chosen such that
\begin{equation}  \label{gplsstop}
 q(y_{j+1}) \leq q(y_j) + \mu
\langle \nabla q(y_j), P [ y_j - \alpha_j \nabla q(y_j) ] - y_j \rangle
\end{equation}
for some $\mu \in (0, 1/2 )$.
The projection $P$ can be computed in $n$ operations by
\[ 
P[x] = \mbox{ mid} (l,u,x), 
\]
where $\mbox{ mid} (l,u,x)$ is the vector whose $i${th} component
is the median of the set $\{ l_i, u_i, x_i \} $.
The step size is computed by a
projected search \cite{more-toraldo} by setting $ \alpha_j $
to the first member of the sequence
$ \alpha_0 ( \half ) ^ j $ for $ j = 0, 1, \ldots $ such that
$ y_{j+1} $ satisfies the sufficient decrease condition \Ref{gplsstop}.
In our implementation, we use
\begin{equation} \label{bqpls}
\alpha_{0}= \arg \min 
\left \{ 
q \left (y_j - \alpha  \nabla_{\Omega} q(y_j) \right ):
\alpha > 0 
\right \} .
\end{equation}
Computation of $ \alpha_0 $ is straightforward, since 
the mapping 
$ \alpha \mapsto 
q \left (y_j - \alpha  \nabla_{\Omega} q(y_j) \right ) $ is
a quadratic.

We generate gradient projection
iterates until sufficient progress is not made or
the active set settles down.
Thus, we generate iterates until either
\begin{equation} 
\label{pgstop1}
 {\cal A}(y_j) = {\cal A}(y_{j-1})
\end{equation}
or the condition
\begin{equation} 
\label{pgstop2}
 q(y_{j-1}) - q(y_j) \leq \eta_1 \max \{q(y_{l-1}) - q(y_l) : 1 \leq l < j \},
\end{equation}
holds for some tolerance $ \eta_1 $ in $ (0,1) $.
If either test is satisfied, we proceed to the
conjugate gradient part of the algorithm.

The first test (\ref{pgstop1}) measures when the
active set settles down. For nondegenerate problems, 
(\ref{pgstop1}) holds in a neighborhood of the solution.
The gradient
projection could be followed until the optimal face is found, but
experience has shown that a large number of iterates may be required.
The second test (\ref{pgstop2}) measures when
the gradient projection method is not making sufficient progress.

Given an iterate $x_k$ and the active set  ${\cal A}(x_k)$,
the conjugate gradient method computes an approximate
minimizer to the subproblem
\begin{equation} \label{cg}
\min \{ q(x_k + d): d_i = 0, i \in {\cal A}(x_k) \}.
\end{equation}
This problem is unconstrained in the free variables.  Note that
if $x_k$ lies in the same face as the solution and $d_k$ solves
(\ref{cg}), then $x_k + d_k$ is the solution of (\ref{def_bqp}).

The conjugate gradient algorithm for solving
\Ref {cg} is implemented by expressing
this subproblem in terms of an equivalent subproblem in the
free variables.
Let $ i_1, \ldots, i_{m_k} $
be the indices of the free variables, and let
the matrix $Z_k$ be the matrix in $\R^{n \times m_k} $ whose
$j$th column is the $i_j$th column of the identity matrix in
$\R^{n \times n}$. With this notation we see that
subproblem \Ref{cg} is equivalent to the
unconstrained subproblem
\begin{equation}
\min \{ q _ k ( w ) : w \in \R ^ { m_k } \}  ,
\label{cg2}
\end{equation}
where
\[
q_k(w) \equiv q ( x _ k + Z _ k  w ) - q ( x_k ) =
\half \langle w , {A_k} w \rangle + \langle r_k , w \rangle  .
\]
The matrix $ A_k $ and the vector $ r_k $ are, respectively, the
reduced Hessian matrix of $q$ and reduced gradient of $q$ at $x_k$
with respect to the free variables.
If $A$ is the Hessian matrix of the quadratic $q$, then
\[
A_k = Z_k^T A Z_k , \qquad r_k = Z_k^T \nabla q(x_k)  .
\]
Also note that $A_k$ is the matrix obtained from
$A$ by taking those rows and columns whose indices correspond
to free variables;
similarly, $r_k$ is obtained from $\nabla q(x_k)$ by
taking the components whose indices correspond to free variables.

Given a starting point $ w_0 \in \R^{m_k} $, the conjugate gradient algorithm
generates a sequence of iterates $ w _ 0 , w_1 , \ldots $
that terminates at a solution
of subproblem \Ref {cg2} in at most $m_k$ iterations.
We use the conjugate gradient algorithm 
until it generates $ w _ j $ such that
\begin{equation}
q_k ( w_{j-1} ) - q_k ( w _ {j} ) \le \eta_2 \max \{
q_k ( w_{l-1} ) - q_k ( w _ {l} ) : 1 \le l \lt j \}
\label{cgstop1}
\end{equation}
for some tolerance $ \eta_2 > 0 $. 
The approximate 
solution of \Ref {cg} is then
$ d _ k = Z_k w _ {j_k} $, where $j_k$ is the first index $j$
that satisfies \Ref {cgstop1}.

The termination test \Ref{cgstop1} is not standard.
Iterative solvers usually terminate when
\[
\| r_j + A_j w_j \| \le \eta_2 \|r_j\| 
\]
for some tolerance $\eta_2 \in (0,1) $.
This test suffers from the erratic behavior
of the residual $ \| r_j + A_j w_j \| $.
On the other hand, the termination test \Ref{cgstop1}
depends on  whether the conjugate gradient method
is making sufficient progress.

Given the direction $ d_k $, we use a projected
search \cite{more-toraldo} to define 
$ x_{k+1} = P [ x_k + \alpha_k d_k] $, where
$ \alpha_k $ is the first element in
the sequence $ ( \half ) ^ k $ for $ k = 0, 1, \ldots $ such that
\begin{equation}  \label{cglsstop2}
 q( x_{k+1} ) \le  q(x_k) + \mu
\langle \nabla q(x_k), x_{k+1} - x_k \rangle .
\end{equation}
More sophisticated projected searches are possible \cite{more-toraldo}
,
but this simple search has proved to be sufficient in all cases tried.
If
\begin{equation}
\label{cgstop3}
\cB ( x_{k+1} ) = \cA ( x_{k+1} ) ,
\end{equation}
then we find a more
accurate solution to subproblem \Ref{cg2} by reducing $ \eta_2 $
and continuing with the conjugate gradient method.
Otherwise, we terminate this iteration.

\begin{Algorithm}
\noindent{\bf Algorithm GPCG}
\begin{list}{}
{
\setlength{\parsep}{0pt}
\setlength{\itemsep}{0pt}
\setlength{\topsep}{0pt}
}
\item[]
Choose $ x_0 \in \Omega $.
\item[]
For $ k = 0, \ldots, $
\begin{list}{$\bullet$}
{
% \setlength{\parsep}{0pt}
% \setlength{\itemsep}{0pt}
% \setlength{\topsep}{0pt}
}
\item[]
Set $y_0 = x_k$, and generate gradient projection iterates
$y_1, \ldots, y_{j_k}$, where $j_k$ is the first index to satisfy
(\ref{pgstop1}) or (\ref{pgstop2}). 
Set $x_k= y_{j_k}$.

\item[]
Set $ w_0 = 0 $, and
generate conjugate gradient iterates $ w_1 , \ldots, w_{j_k} $
for the reduced system (\ref{cg}).
Set $ d _ k = Z_k w _ {j_k} $, where $j_k$ is the first index
that satisfies \Ref {cgstop1}.
\item[]
Use a projected search to generate $ x_{k+1} $.
If \Ref{cgstop3} holds, reduce $\eta_2$, and
continue with the conjugate gradient method.
\end{list}
\end{list}
\end{Algorithm}

Our outline of algorithm GPCG does not include the termination test.
An advantage of the termination test \Ref{bqp_approximate_sol} is 
that this test is satisfied \cite{JVB92}
in a finite number of iterations.
On nondegenerate problems GPCG terminates \cite{more-toraldo}
at the solution in a finite number of iterations.

Algorithm GPCG is suitable for large problems.
As opposed to some other active set methods,
each iteration is capable of
adding or removing multiple constraints from the active set.
Moreover, as we shall see, GPCG tends to require few iterations
for convergence.
Another advantage of the GPCG algorithm is that convergence
can be achieved while requiring only approximate solutions
to the linear systems.



